fix: panic when dumping memory profile #12276

Merged
5 commits merged into main from eric/fix_12254 on Sep 14, 2023

@fuyufjh (Member) commented Sep 13, 2023

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

Resolves #12254

See also risingwavelabs/jemallocator@c72ce6a

The bug is that dump.prefix is write-only, so it cannot be read back. Now we use opt.dump_prefix instead.
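To make the failure mode concrete, here is a minimal sketch in Rust. It assumes a hypothetical in-memory registry (Ctl, Access and CtlError are illustrative names, not the tikv-jemalloc-ctl API) in which every key carries an access mode, the way jemalloc's mallctl keys do; the two key names are taken from the description above. Reading a write-only key returns an error, and calling unwrap() on that error is one way such a read turns into a panic.

```rust
use std::collections::HashMap;

/// Access mode of a control key (jemalloc documents its keys as r-, -w or rw).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Access {
    ReadOnly,
    WriteOnly,
}

#[derive(Debug, PartialEq)]
pub enum CtlError {
    /// The key exists but can only be written.
    NotReadable(String),
    /// The key does not exist.
    NoSuchKey(String),
}

/// Hypothetical stand-in for a mallctl-style namespace; not a real library API.
pub struct Ctl {
    entries: HashMap<String, (Access, String)>,
}

impl Ctl {
    /// Registers both keys with the same configured dump prefix.
    pub fn new(prefix: &str) -> Self {
        let mut entries = HashMap::new();
        // Write-only: can be set at runtime, but never read back.
        entries.insert("dump.prefix".to_string(), (Access::WriteOnly, prefix.to_string()));
        // Read-only option exposing the prefix the allocator was configured with.
        entries.insert("opt.dump_prefix".to_string(), (Access::ReadOnly, prefix.to_string()));
        Ctl { entries }
    }

    /// Reads a key, rejecting write-only keys instead of returning their value.
    pub fn read(&self, key: &str) -> Result<&str, CtlError> {
        match self.entries.get(key) {
            None => Err(CtlError::NoSuchKey(key.to_string())),
            Some((Access::WriteOnly, _)) => Err(CtlError::NotReadable(key.to_string())),
            Some((Access::ReadOnly, value)) => Ok(value.as_str()),
        }
    }
}
```

In this model, ctl.read("dump.prefix").unwrap() panics, while switching the lookup to opt.dump_prefix returns the configured prefix, which mirrors the change described in this PR.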

Also resolves #12186 (comment)

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
  • My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
  • All checks passed in ./risedev check (or alias, ./risedev c)
  • My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)
  • My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)

Documentation

  • My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.

@fuyufjh fuyufjh requested review from yuhao-su and xxchan September 13, 2023 09:21
@fuyufjh fuyufjh requested a review from a team as a code owner September 13, 2023 09:21
@github-actions github-actions bot added the type/fix Bug fix label Sep 13, 2023
@@ -55,7 +55,7 @@ tower = { version = "0.4", features = ["util", "load-shed"] }
 tracing = "0.1"

 [target.'cfg(target_os = "linux")'.dependencies]
-tikv-jemalloc-ctl = { git = "https://github.com/risingwavelabs/jemallocator.git", rev = "b7f9f3" }
+tikv-jemalloc-ctl = { git = "https://github.com/risingwavelabs/jemallocator.git", rev = "64a2d9" }
Member

Also use the workspace dependency? 👀

Member Author

tikv-jemalloc-ctl is different from tikv-jemallocator, and it's only used in compute for memory controller policy. Are you sure you want to move it to a workspace dependency as well?

Contributor

There can never be two different versions of jemalloc ctl/allocator, so using the workspace dependency makes sense to me.

Contributor

also it can be used in all nodes for profiling

Member

tikv-jemalloc-ctl is different from tikv-jemallocator, and it's only used in compute for memory controller policy.

Oh, I didn't realize this.

Member Author

also it can be used in all nodes for profiling

Agreed. Let's do it when that time comes!


@fuyufjh fuyufjh enabled auto-merge September 14, 2023 03:44
@fuyufjh (Member Author) commented Sep 14, 2023

Encountered a weird problem: simulation test stack overflowed. @wangrunji0408 is helping with it.

gdb_bt.txt

@wangrunji0408 (Contributor)

Encountered a weird problem: simulation test stack overflowed. @wangrunji0408 is helping with it.

gdb_bt.txt

Found the problem in the stack trace:
jemalloc init -> get system time -> intercepted by madsim -> get thread local variable -> malloc -> jemalloc init (infinite loop 💥)

I can't reproduce it on my Mac. Looks like the problem is only on Linux.

@fuyufjh (Member Author) commented Sep 14, 2023

It might be related to unprefixed_malloc_on_supported_platforms. Since I moved the dependency to workspace level, this is enabled globally. Let me try to disable it for simulation test.

Looks like the problem is only on Linux.

Yes, because unprefixed jemalloc is not available on macOS.
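The cycle from the backtrace, and why the unprefixed feature matters, can be sketched as a toy model in Rust. This is an illustration under stated assumptions, not RisingWave or jemalloc code: MallocSymbol, jemalloc_first_alloc, madsim_now, c_malloc and MAX_DEPTH are hypothetical names, recursion depth stands in for stack usage, and jemalloc's real initialization state machine is not modeled. The only thing that varies is where the C-level malloc symbol resolves, which, per the jemalloc-sys README linked in this thread, is what unprefixed_malloc_on_supported_platforms changes on Linux.

```rust
/// Where the C-level `malloc` symbol resolves (toy model).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum MallocSymbol {
    /// Feature on (Linux): jemalloc exports unprefixed `malloc`, replacing libc's.
    Jemalloc,
    /// Feature off: jemalloc's own symbols are prefixed, so `malloc` stays with libc.
    Libc,
}

/// Recursion depth standing in for the thread's stack size.
pub const MAX_DEPTH: u32 = 64;

/// First allocation served by jemalloc: it must initialize itself first.
/// Returns the nesting depth at which initialization could complete.
pub fn jemalloc_first_alloc(sym: MallocSymbol, depth: u32) -> Result<u32, String> {
    if depth >= MAX_DEPTH {
        return Err(format!("stack overflow: jemalloc init re-entered {depth} times"));
    }
    // jemalloc init -> get system time (intercepted by madsim in simulation builds)
    madsim_now(sym, depth)
}

/// Per the backtrace, madsim's clock hook reads a thread-local variable, which calls `malloc`.
fn madsim_now(sym: MallocSymbol, depth: u32) -> Result<u32, String> {
    c_malloc(sym, depth)
}

/// Dispatches a C-level `malloc` call to whichever allocator owns the symbol.
fn c_malloc(sym: MallocSymbol, depth: u32) -> Result<u32, String> {
    match sym {
        // jemalloc is still uninitialized, so the cycle starts again one level deeper.
        MallocSymbol::Jemalloc => jemalloc_first_alloc(sym, depth + 1),
        // libc serves the allocation, so jemalloc's initialization can finish.
        MallocSymbol::Libc => Ok(depth),
    }
}
```

With MallocSymbol::Libc the first allocation completes at depth 0; with MallocSymbol::Jemalloc the chain keeps re-entering jemalloc until the depth limit is hit, which is consistent with the overflow disappearing once the feature is disabled.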

@wangrunji0408 (Contributor) commented Sep 14, 2023

It might be related to unprefixed_malloc_on_supported_platforms.

Yes. I can reproduce it on Linux with this feature enabled. And the bug disappeared when I disabled this feature.

I'm curious about what specific things this feature does. 🤯
https://github.com/tikv/jemallocator/blob/0f8983f71813c6e052c6ab445e965c4b0a7e251e/jemalloc-sys/README.md?plain=1#L72

@BugenZhao (Member) commented Sep 14, 2023

Since I moved the dependency to workspace level, this is enabled globally.

Oops. 🫨 I'm not sure about this. Is this true?

@fuyufjh fuyufjh added this pull request to the merge queue Sep 14, 2023
codecov bot commented Sep 14, 2023

Codecov Report

Merging #12276 (0259125) into main (c07633a) will decrease coverage by 0.01%.
Report is 3 commits behind head on main.
The diff coverage is 0.00%.

@@            Coverage Diff             @@
##             main   #12276      +/-   ##
==========================================
- Coverage   69.97%   69.97%   -0.01%     
==========================================
  Files        1408     1408              
  Lines      235155   235154       -1     
==========================================
- Hits       164550   164549       -1     
  Misses      70605    70605              
Flag    Coverage Δ
rust    69.97% <0.00%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown.

Files Changed                                  Coverage Δ
src/compute/src/memory_management/policy.rs    0.00% <0.00%> (ø)

... and 1 file with indirect coverage changes


Merged via the queue into main with commit 888f2dd Sep 14, 2023
7 of 8 checks passed
@fuyufjh fuyufjh deleted the eric/fix_12254 branch September 14, 2023 06:54
@fuyufjh (Member Author) commented Sep 14, 2023

It might be related to unprefixed_malloc_on_supported_platforms.

Yes. I can reproduce it on Linux with this feature enabled. And the bug disappeared when I disabled this feature.

I'm curious about what specific things this feature does. 🤯 https://github.com/tikv/jemallocator/blob/0f8983f71813c6e052c6ab445e965c4b0a7e251e/jemalloc-sys/README.md?plain=1#L72

The feature was enabled to solve #9669.

Li0k pushed a commit that referenced this pull request Sep 15, 2023
Little-Wallace added a commit that referenced this pull request Sep 18, 2023
fuyufjh added a commit that referenced this pull request Sep 29, 2023
fuyufjh added a commit that referenced this pull request Sep 29, 2023
fuyufjh added a commit that referenced this pull request Sep 29, 2023
Labels: type/fix (Bug fix)
Projects: None yet
Development: successfully merging this pull request may close these issues: bug: panic on memory dump
Participants: 4